Deqing Fu
I am Deqing Fu, a fourth-year Ph.D. candidate in Computer Science at the University of Southern California (USC). My main research interests are deep learning theory, natural language processing, and the interpretability of AI systems. I am co-advised by Prof. Vatsal Sharan of the USC Theory Group and Prof. Robin Jia of the Allegro Lab within the USC NLP Group, and I work closely with Prof. Mahdi Soltanolkotabi and Prof. Shang-Hua Teng. Before USC, I completed my undergraduate degree in Mathematics (with honors) and my master's in Statistics at the University of Chicago.
My research focuses on understanding large language models from algorithmic and theoretical perspectives, as well as developing methods for multimodal learning and synthetic data generation. You can find my publications on Google Scholar and my recent CV here.
Algorithmic Perspectives on Large Language Models
- In-context learning in Transformers: does it implement gradient descent? (NeurIPS 2024)
- Arithmetic in pretrained LLMs: memorization vs. mechanisms? (NeurIPS 2024, arXiv 2025)
- What distinguishes Transformers from other architectures? (ICLR 2025)
- Decision theory for LLM reasoning under uncertainty (ICLR 2025 Spotlight)
Synthetic Data and Multimodal Learning
- Modality sensitivity in Multimodal LLMs (COLM 2024)
- VLM feedback for Text-to-Image generation (NAACL 2025)
- Token-level reward models (TLDR) for reducing hallucinations (ICLR 2025)
News
| Date | News |
|---|---|
| Oct 22, 2025 | New preprint: When Do Transformers Learn Heuristics for Graph Connectivity? arXiv. |
| Jul 23, 2025 | Three new preprints (Multimodal Steering, Resa, and Zebra-CoT). |
| Jul 22, 2025 | Check out our new paper Zebra-CoT on interleaved text and visual reasoning! Dataset and model are available on Hugging Face 🤗. |
| May 22, 2025 | Talk at Stanford NLP Seminar. Slides here. |
| May 08, 2025 | Talk at UChicago/TTIC NLP Seminar. |
Selected Publications
See the full list or Google Scholar for all publications.
2025
- arXiv: When Do Transformers Learn Heuristics for Graph Connectivity? In arXiv, 2025. *Equal Contribution
- ICLR: TLDR: Token-Level Detective Reward Model for Large Vision Language Models. In International Conference on Learning Representations (ICLR), 2025.
- ICLR: Transformers Learn Low Sensitivity Functions: Investigations and Implications. In International Conference on Learning Representations (ICLR), 2025. *Equal Contribution
- NAACL: DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback. In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025. *Equal Contribution
2024
- NeurIPS: Pre-trained Large Language Models Use Fourier Features to Compute Addition. In Conference on Neural Information Processing Systems (NeurIPS), 2024. Dataset | Code